REPORTS 


fivefold increase in recombination (Fig. 3C). 
This elevated recombination was only slightly 
reduced by CR. Finally, we observed a highly 
significant negative correlation between life 
span and rDNA recombination rate (fig. S3). 
Although these data do not exclude the pos- 
sibility that CR may mediate yeast life span 
independently of its effects on the rDNA, these 
data provide strong evidence that CR extends 
life span by suppressing rDNA recombination 
irrespective of whether SIR2 is present or
absent. They also demonstrate that in a sir2Δ
fob1Δ strain, Hst2 is critical for maintaining
rDNA stability. 

Although the deletion of HST2 blocked the 
ability of CR to extend life span in the sir2Δ
fob1Δ strain, it was formally possible that this
was caused by toxic levels of ERCs in the 
strain, precluding alternative CR pathways 
from taking effect. Therefore, we determined 
whether HST2 could increase life span when 
overexpressed in order to test whether HST2 is 
a bona fide longevity gene (9). Consistent with 
the ability of HST2 to increase rDNA silencing 
and decrease rDNA recombination (Fig. 1 and 
fig. S1), overexpression of HST2 in W303AR5 
sir2Δ fob1Δ extended life span to the same
extent as CR in this strain background (Fig. 
4A), as well as in a wild-type strain (fig. S4). 
No additive effect of HST2 overexpression and 
CR was observed, indicating that HST2 and 
CR extend life span of sir2Δ fob1Δ mutants
through the same pathway (28). 

Next, we investigated whether the residual 
life-span extension seen for the hxk2Δ mutant (a
mimic of intense CR) lacking SIR2 and HST2
(Fig. 2C) was due to the activity of another
sirtuin. As previously reported (16), deletion of
HST1 markedly increased rDNA recombination
in a wild-type strain (Fig. 4B). Although de- 
leting HST3 and HST4 together has been shown 
to decrease chromosomal stability and increase 
mitotic recombination (29), we did not observe 
increased rDNA recombination in a W303AR5 
hst3Δ hst4Δ strain, although recombination in
an hst4Δ single mutant is about twice as high as
that in the wild type. Because deletion of HST1
had the greatest effect on rDNA recombination,
we suspected that Hst1 might be the factor
responsible for the residual life-span extension. 
This hypothesis was consistent with our finding 
that the general sirtuin inhibitor NAM com- 
pletely blocked the life-span extension of a 
sir2Δ fob1Δ strain by hxk2Δ (Fig. 1D) and a
recent report that Hst1 functions in the nucleus
with Hst2 in gene silencing (23). Whereas
deletion of either HST3 or HST4 in this strain
did not affect the ability of hxk2Δ to extend
life span (fig. S5), deletion of HST1 completely
eliminated the residual life-span extension
provided by hxk2Δ in the BY4742 sir2Δ fob1Δ
hst2Δ strain (Fig. 4C).

In a previous study, the life span of a sir2Δ
fob1Δ hst1Δ strain was extended by CR (19),
leading the authors to conclude that HST1 plays
no role in CR. Indeed, in agreement with this
finding, we find that CR is effective in suppressing
recombination of such a mutant (Fig.
4D). However, this study implies that HST2
underlies the CR-mediated life-span extension
of this strain and that HST1 plays a minor role
that is observed only in the absence of SIR2 and
HST2.

Our results show that HST2 is responsible
for Sir2-independent life-span extension by CR 
and that it does so by suppressing rDNA re- 
combination, the same mechanism by which 
SIR2 extends life span. These findings high- 
light the importance of genomic stability as a 
determinant of yeast life span and raise the 
likelihood that multiple members of the sirtuin 
family in higher organisms also play critical 
roles in maintaining genomic stability and 
possibly in extending life span during times 
of adversity. 

Structure of SARS Coronavirus
Spike Receptor-Binding Domain 
Complexed with Receptor 


Fang Li, Wenhui Li, Michael Farzan, Stephen C. Harrison*


The spike protein (S) of SARS coronavirus (SARS-CoV) attaches the virus to its 
cellular receptor, angiotensin-converting enzyme 2 (ACE2). A defined receptor- 
binding domain (RBD) on S mediates this interaction. The crystal structure at 2.9 
angstrom resolution of the RBD bound with the peptidase domain of human 
ACE2 shows that the RBD presents a gently concave surface, which cradles the 
N-terminal lobe of the peptidase. The atomic details at the interface between the 
two proteins clarify the importance of residue changes that facilitate efficient 
cross-species infection and human-to-human transmission. The structure of the 
RBD suggests ways to make truncated disulfide-stabilized RBD variants for use in
the design of coronavirus vaccines.


The SARS coronavirus (SARS-CoV) is the
agent of severe acute respiratory syndrome,
which emerged as a serious epidemic in 2002
to 2003, with over 8,000 infected cases and a
fatality rate of ~10% (1-4). Coronaviruses,
which are large, enveloped, positive-strand
RNA viruses, infect a variety of mammalian
and avian species and can cause upper respiratory,
gastrointestinal, and central nervous
system diseases (5). The large spike protein (S)
on the virion surface mediates both cell attachment
and membrane fusion (5). In the case
of several avian and mammalian coronaviruses,
S is cleaved by furin or a related protease
into S1 and S2; the former bears the
receptor attachment site; the latter, the fusion
activity. The structures of refolded heptad-repeat
fragments of S2 from the mouse hepatitis
coronavirus (MHV) and from SARS-CoV
(6-8) confirm earlier predictions (4) that the
postfusion conformation has the trimer-of-hairpins
organization characteristic of “class 1”
fusion proteins, such as those of HIV, influenza
virus, and Ebola virus (9). S on mature
SARS-CoV virions does not appear to be
cleaved, and the sequence that aligns with the
MHV cleavage site lacks the essential residues
for furin susceptibility (3, 4, 10, 11). We therefore
refer to the S1 and S2 “regions” (12),
which contain 666 and 583 amino acid residues,
respectively (Fig. 1A).

Fig. 1. The SARS-CoV spike protein RBD. (A) Domain structure of the SARS-CoV spike protein. The boundaries
of the RBD were determined by protease digestion followed by N-terminal sequencing and mass spectrometric
analysis of the digestion products (33). The RBM was identified from the crystal structure of RBD in complex with
the human receptor. The fusion peptide (FP) and the two heptad repeat regions (HR-N and HR-C) of S2 have been
identified by studies using synthetic peptides (34, 35). The transmembrane anchor and intracellular tail have
been assigned from sequence characteristics. (B) Crystal structure of the RBD (core structure in cyan and RBM in red) in
complex with the human receptor ACE2 (green). (C) Detail of the binding interface, with side chains of three residues
(Leu472, Asn479, and Thr487 from left to right) critical for cross-species and human-to-human transmission of
SARS-CoV. (D) Sequence and secondary structures of the RBD. Helices are drawn as cylinders, and strands are
drawn as arrows. The RBM is in red; the remainder of the RBD is in cyan. Disordered regions are shown as dashed lines (36).

Coronaviruses exploit a wide variety of 
cellular receptors (5). SARS-CoV and another 
human coronavirus, HCoV-NL63, both use as 
their receptor a cell-surface zinc peptidase, 
angiotensin-converting enzyme 2 (ACE2) 
(13, 14). The crystal structure of the ACE2 
ectodomain (15) shows a claw-like N-terminal
peptidase domain, with the active site at the 
base of a deep groove, and a C-terminal 
“collectrin” domain. A fragment of the S1 
region, residues 318 to 510, is sufficient for 
tight binding to the peptidase domain of ACE2 
(11, 16, 17). This fragment, the receptor- 
binding domain (RBD), is the critical determi- 
nant of virus-receptor interaction and thus of 
viral host range and tropism (18). SARS-CoV
isolated from patients during the 2002-2003
epidemic, and also from milder sporadic cases
in 2003 to 2004, appears to derive from a
nearly identical virus circulating in palm civets
and raccoon dogs (19, 20). Changes in just a
few residues in the RBD can lead to efficient
cross-species transmission (18, 20). The RBD
also includes important viral-neutralizing epitopes
(21-23), and it may be sufficient to raise
a protective antibody response in inoculated 
animals. 

We expressed the SARS-CoV spike protein 
RBD, residues 306 to 575, in Sf9 cells and 
purified the fragment (24). Brief treatment 
with chymotrypsin yielded a shorter fragment, 
residues 306 to 527. Soluble ACE2, residues 
19 to 615, was expressed in Sf9 cells and 
purified as described in (24). The two com- 
ponents were mixed, and the complex was 
purified by size-exclusion chromatography on 
Superdex 200 (Amersham Biosciences, Piscataway,
NJ). Crystals in space group P2₁, a =
82.3 Å, b = 119.4 Å, c = 113.2 Å, β = 91.2°,
with two complexes per asymmetric unit, were
grown at room temperature from a mother liquor
containing 24% polyethylene glycol 6000,
150 mM NaCl, 100 mM Tris at pH 8.2, and
10% ethylene glycol. We determined the structure
of the ACE2/SARS-CoV/RBD complex
by molecular replacement with ACE2 as the
search model, and we refined it at 2.9 Å resolution
(24). The final model contains residues
19 to 615 of the N-terminal peptidase
domain of human ACE2 and residues 323 to
502 (except for 376 to 381) of the RBD, as
well as glycans N-linked to ACE2 residues 53,
90, 322, and 546 and to RBD residue 330, and
65 solvent molecules. R_free is 27.5% and
R_work is 22.1% (see table S1 for definitions).
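
As a quick consistency check on the cell constants quoted above, the volume of a monoclinic unit cell follows from V = a·b·c·sin β. The short Python sketch below is illustrative only: the function names are ours and are not part of any crystallographic package, and the division into four complexes per cell assumes the two symmetry-equivalent positions of space group P2₁, each holding the two complexes per asymmetric unit reported in the text.

```python
import math


def monoclinic_cell_volume(a, b, c, beta_deg):
    """Volume (cubic angstroms) of a monoclinic cell: V = a * b * c * sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))


def volume_per_complex(cell_volume, asu_per_cell=2, complexes_per_asu=2):
    """Share of the cell volume available to each complex.

    asu_per_cell=2 is the number of symmetry-equivalent positions in P2(1);
    complexes_per_asu=2 is the value reported in the text.
    """
    return cell_volume / (asu_per_cell * complexes_per_asu)


# Cell constants quoted in the text (angstroms and degrees).
volume = monoclinic_cell_volume(82.3, 119.4, 113.2, 91.2)
print(f"cell volume: {volume:.3e} cubic angstroms")
print(f"volume per complex: {volume_per_complex(volume):.3e} cubic angstroms")
```

Because β is close to 90°, the sine term changes the volume by only a few hundredths of a percent relative to the orthogonal product a·b·c.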

The ACE2 peptidase domain has two lobes 
that close toward each other after substrate 
engagement (15). In one of the two complexes
in the asymmetric unit of our crystals, ACE2 is 
fully open; in the other, it is slightly closed 
(fig. S1). The SARS-CoV S protein contacts 
the tip of one lobe of ACE2 (Fig. 1). It does 
not contact the other lobe, nor does it occlude 
the peptidase active site. Binding of the spike 
protein to ACE2 is not altered by the addition 
of a specific ACE2 inhibitor, which is expected
to favor the closed state (18). Thus,
both structural and biochemical data indicate 
that viral attachment is unaffected by the open- 
to-closed transition. 

The RBD contains two subdomains (Fig. 1): 
a core and an extended loop. The core is a 
five-stranded anti-parallel β sheet (β1 to β4
and β7), with three short connecting α helices
(αA to αC). There are nine cysteines in the
chymotryptic fragment. Disulfide bonds con- 
nect cysteines 323 to 348, 366 to 419, and 467 
to 474. The remaining cysteines are disordered 
but two (378 and 511) are in the same 
neighborhood and could form a disulfide in 
the recombinant fragment, even if they have 
other partners in the intact S protein. The 
extended loop subdomain lies at one edge of 
the core; it presents a gently concave outer 
surface formed by a two-stranded β sheet (β5
and β6). The base of this concavity cradles the
N-terminal helix of ACE2; a ridge to one side
of it, which is reinforced by the Cys467-Cys474
disulfide bridge, contacts the loops between
ACE2 helices α2 and α3; a ridge to the other
side inserts between a short ACE2 helix (residues
329 to 333) and a β hairpin at ACE2
residue 353 (Fig. 1C). Residues 445 to 460 of
the RBD anchor the entire receptor-binding 
loop to the core of the RBD. We refer to this 
loop (residues 424 to 494), which makes all 
the contacts with ACE2, as the receptor- 
binding motif (RBM). 

The RBM surface is complementary to the 
receptor tip, with about 1700 Å² of buried
surface at the interface (Fig. 2A and fig. S2),
consistent with their high affinity (dissociation
constant Kd ~ 10⁻⁸ M) (18, 21). A total of 18
residues of the receptor contact 14 residues of 
the viral spike protein (Table 1). Networks of 
hydrophilic interactions, which occur largely 
among amino acid side chains, predominate.

Fig. 2. Features contributing to specific recognition of ACE2 by the SARS-CoV RBD. (A) Surface
complementarity. Space-filling representations of ACE2 (in green), RBD (core structure in cyan and
RBM in red), and the complex of ACE2 and RBD are shown. The complex buries 1700 Å² at the
binding interface. (B) Distribution of tyrosines (magenta) and cysteines (yellow) on the RBD. The
RBM is particularly tyrosine-rich. The six tyrosines that contact ACE2 are accompanied by an
asterisk. The three disulfide bonds link C323 to C348, C366 to C419, and C467 to C474; two are
labeled, and the third is partly concealed by the lower corner of the β sheet.

Fig. 3. Residues important for species specificities of SARS-CoV. (A) Met82 of human ACE2 is
asparagine in rat ACE2, introducing a glycan that appears to interfere with infection of rat cells. (B)
Asn479 (boldface) is present in most SARS-CoV sequences from human specimens. Lys479, which is
found in most sequences from palm-civet specimens, would have steric and electrostatic
interference from residues (e.g., His34) on the N-terminal helix of human ACE2. (C) Thr487
(boldface) appears to enhance human-to-human transmission of SARS-CoV. The methyl group of
Thr487 lies in a hydrophobic pocket at the ACE2/RBD interface. On rat and mouse ACE2, residue
353 is histidine, disfavoring viral binding. The dashed black lines indicate hydrogen bonds.


Six RBM residues at this interface are tyro- 
sines, which present both a polar hydroxyl group 
and a hydrophobic aromatic ring (Fig. 2B).

Coronaviruses are classified in three groups
(5); SARS-CoV belongs to group 2 (fig. S3).
Spike-protein sequences from several members 
of group 2 lead us to expect that all have rather 
similar structures, including the RBD core (fig. 
S3). The SARS-CoV RBM is substantially 
shorter than are the corresponding regions in 
several other group-2 viral spike proteins, how- 
ever, and it has no evident sequence similarity 
to the others (fig. S3). Thus, this extended loop 
is probably a hypervariable decoration of an 
otherwise-conserved domain. In the case of 
MHV, the receptor (murine carcinoembryonic
antigen cell adhesion molecule 1a, or
CEACAM1a) (25, 26) makes contact not with
the extended-loop subdomain (nor, indeed, with 
any part of the domain homologous to the 
SARS-CoV RBD), but rather with structures 
in the N-terminal region of the spike protein 
(27). Receptors and receptor-binding regions
of other group-2 coronaviruses have not
been identified. The group-1 human corona- 
virus 229E receptor is aminopeptidase N; 
the corresponding RBD on its spike protein 
is known (28). 

The SARS-CoV appears to derive from a 
cross-species infection with a coronavirus 
isolated from palm civets (19, 20). S-gene 
sequences from civet and human specimens 
obtained during the 2002-to-2003 epidemic 
show that their RBDs differ at only four 
positions, residues 344, 360, 479, and 487, 
but the human viral spike protein binds the 
human receptor 10³ to 10⁴ times more tightly
than does its civet spike counterpart (18).
Residues 344 and 360 are far from the binding 
interface in the complex described here, and 
mutation to the corresponding civet CoV 
residues does not affect affinity or infectivity 
(18). The critical changes are therefore at 
positions 479 and 487, both of which lie in 
the RBD-receptor contact (Figs. 1 and 3 and 
Table 1). 

Table 1. Contacts between ACE2 and SARS-CoV RBD. Residues in ACE2 that
contact the RBD are listed by their position (numbers across the top of each
column) and by their single-letter identity (36) in the palm-civet, mouse, rat,
and human receptors. The residues they contact in the structure described here
and their position numbers in the spike proteins from human isolates are shown
at the bottom of each column.

L T T Y Q E Y Q V L T Y D civet ACE2
N T N Q E D Y Q L T S F T mouse ACE2
K S K Q E D Y Q L I N F N rat ACE2
Q T K H E D Y Q L L M Y N human ACE2
N473 Y475 Y475 Y440 Y491 Y436 Y484 Y436 Y484 L472 L472 N473 T402 human SARS
Y442 N479 T486 Y484 Y475
T487

E N K G civet ACE2
A N H G mouse ACE2
T N H G rat ACE2
E N K G human ACE2
R426 R426 T486 G488 Y491 human SARS
T487 G488
Y491


The changes at these two positions are 
relatively subtle. In most viral sequences from 
palm-civet specimens, residue 479 is lysine 
and 487 is serine, whereas in SARS-CoV 
sequences from the 2002-2003 epidemic, 
these residues are asparagine and threonine, 
respectively. The presence of lysine at 479 
reduces affinity for human but not for civet 
ACE2; serine at 487 reduces affinity for both 
receptors (18). Position 479 lies opposite the
ACE2 N-terminal helix (α1), on which several
residues differ in identity between civet and
human (Table 1). Some civet coronavirus
sequences have asparagine at position 479,
and the difference does not appear to be
critical for binding to the civet receptor (18).
At position 487 in the spike protein, replacing
threonine (SARS-CoV) with serine (civet viral
sequences) would remove the threonine methyl
group, which lies in a hydrophobic pocket
bounded by atoms in the side chains of Tyr41
and Lys353 on the receptor and Tyr484 in the
RBM (Fig. 3C). This pocket appears to be
relatively inflexible. A main-chain hydrogen
bond (carbonyl of ACE2 Lys353 to amide of
RBD Gly488) fixes the relative positions of
receptor and spike protein quite precisely.
Moreover, the Thr487 rotamer is determined
by a hydrogen bond from Oγ to the main-chain
carbonyl of Tyr484; the aliphatic part of the
Lys353 side chain is sandwiched between the
rings of ACE2 Tyr41 and RBD Tyr491, and
the ε-NH3+ is neutralized by ACE2 Asp38.
Mutation to serine would thus leave a hard-to-fill
van der Waals hole; indeed, a mutation in
which Thr487 is replaced by Ser in the human
RBD decreases affinity for human ACE2 by
more than 20-fold (18). Civet ACE2 is essentially
identical to human ACE2 at all the
relevant positions in the vicinity of this interaction;
like the human receptor, it appears to
bind RBDs with threonine at 487 more tightly
than those with serine (18). All of the more
than 100 S-protein sequences obtained during 
the 2002-2003 SARS epidemic have threonine 
at this position, whereas all 14 such sequences 
from palm-civet and raccoon-dog isolates have 
serine (29, 30). 
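
To put the affinity figures in this section on an energy scale, a dissociation constant or a fold change in affinity can be converted to a free energy with ΔG° = RT ln Kd and ΔΔG = RT ln(fold change). The Python sketch below is only a back-of-the-envelope aid: the function names are ours, the temperature of 298.15 K is an assumption (the assay temperature is not given here), and the dissociation constant is assumed to be expressed in molar units relative to a 1 M standard state.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temp_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant in M."""
    return R_KCAL * temp_k * math.log(kd_molar)


def ddg_from_fold_change(fold, temp_k=298.15):
    """Free-energy penalty (kcal/mol) for an affinity that is `fold` times weaker."""
    return R_KCAL * temp_k * math.log(fold)


# Kd of roughly 1e-8 M for the RBD-ACE2 complex, as quoted earlier in the text.
print(f"dG for Kd = 1e-8 M: {binding_free_energy(1e-8):.1f} kcal/mol")
# The Thr487-to-Ser mutation weakens binding by more than 20-fold.
print(f"ddG for a 20-fold loss: {ddg_from_fold_change(20.0):.1f} kcal/mol")
```

Under these assumptions a 20-fold loss of affinity corresponds to somewhat under 2 kcal/mol, which is a plausible cost for leaving a single methyl-sized cavity unfilled.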

Viruses from sporadic SARS cases during 
2003 to 2004, each of which was an inde- 
pendent cross-species event from which no 
human-to-human transmission occurred, all 
had asparagine at 479 and serine at 487
(29, 30). It is therefore plausible that a key
factor determining severity (and possibly
human-to-human transmission) is the presence
or absence of a γ-methyl group on the 487 side
chain. The 2003-2004 sequences differed,
however, at two other RBD positions from
those sequences obtained during the epidemic
of the previous winter: Leu472 had changed to
proline and Asp480 to glycine. Inspection of the
model suggests that the leucine-to-proline 
change might have contributed to attenuation, 
by reducing the spike-receptor contact surface 
(Fig. 3A). A similar rationale is harder to find 
for the aspartate-to-glycine substitution, be- 
cause the aspartyl side chain projects into 
solution, and mutation of this residue to 
alanine has no effect on RBD binding to 
ACE2 (16).

Two other species differences are worth 
noting. Rat ACE2 does not support infection 
by SARS-CoV, and mouse ACE2 does so only 
inefficiently (30). At position 82, where the 
human receptor has a methionine, the rat pro- 
tein has a glycosylated asparagine; the glycan 
would disrupt by steric interference a hydrophobic
contact between Met82 and Leu472 in
the RBM (Fig. 3A). At position 353, where
the human receptor has a lysine critical for
the contact with Thr487 in the RBM (Fig. 3C),
the rat receptor has histidine. Mouse ACE2
also has histidine at 353, but it does not have
a glycosylation site at 82. It thus bears one
but not both of the differences that render rat
ACE2 inactive as a receptor, and mutation
of His353 to lysine in mouse ACE2 allows
high-level infection of murine cells by SARS- 
CoV (30). 

The residues singled out for description in 
the preceding paragraphs are not, of course, the 
only ones critical for the tight complementarity 
of the SARS-CoV RBD and human (or palm 
civet) ACE2. They are simply the positions at 
which there are differences among isolates and 
receptors important for binding and entry. 
Other species might in principle harbor var- 
iants of the same virus that would require 
changes at different positions to be able to 
infect human cells, and other changes in the 
civet virus might permit cross-species infection 
even in the absence of the serine-to-threonine 
mutation at position 487. The structure might 
allow one to recognize such changes in future
animal isolates. For example, the human
receptor (but not the civet receptor) bears
an N-linked glycan at position 90. Mutation
of Asn90 to eliminate the glycan enhances
S-protein-mediated binding and infection of
human cells by pseudotyped lentiviruses (18).
The glycan faces a loop in the RBD containing
residues 399 to 412. Changes in this
loop that reduce likely interference with the
glycan might have the same enhancing effects
as does elimination of the glycan on the receptor
or mutation of Ser487 to threonine on
the S protein. 

Neutralizing antibodies against SARS-CoV 
recognize epitopes in the RBD (21-23). For
example, a high-affinity recombinant human
monoclonal antibody, 80R, which is sensitive
to mutation within the RBM, inhibits viral
entry by blocking association of virus and receptor
(21, 31). The soluble SARS-CoV RBD
is therefore of potential use as an immunogen 
(23, 32). In the structure described here, the 
interface of the RBD with the receptor is very 
well defined, but the opposite face of the RBD 
is more disordered. The latter surface would 
interact with the rest of the spike protein, and it 
indeed contains the N and C termini of the 
RBD fragment as well as the disordered loop, 
residues 376 to 381. Thus, this face of the
protein could be modified in various ways in 
the molecular engineering of a candidate vac- 
cine. The loop from 376 to 381 could probably 
be shortened and the disordered cysteines 
removed; other disulfides could be introduced 
to add stability; and the C-terminal segment 
could be used to link the RBD to an oligo- 
meric core. Of the 23 glycosylation sites on 
S, three are in the RBD. Only one (Asn330) is
sufficiently ordered in our structure to show 
even a single sugar, and all are well separated 
from the RBM. Glycosylation is therefore 
unlikely to interfere with potential neutralizing 
epitopes within the RBD; introduction of new 
glycosylation sites could in principle “focus” 
the antigenicity of a candidate immunogen. 

Toward High-Resolution
de Novo Structure Prediction 
for Small Proteins 


Philip Bradley, Kira M. S. Misura, David Baker* 


The prediction of protein structure from amino acid sequence is a grand 
challenge of computational molecular biology. By using a combination of im- 
proved low- and high-resolution conformational sampling methods, improved 
atomically detailed potential functions that capture the jigsaw puzzle-like
packing of protein cores, and high-performance computing, high-resolution 
structure prediction (<1.5 angstroms) can be achieved for small protein 
domains (<85 residues). The primary bottleneck to consistent high-resolution 
prediction appears to be conformational sampling. 


It has been known for more than 40 years that 
the three-dimensional structures of proteins are 
completely determined by their amino acid 
sequences (1), and the prediction of protein
structure from amino acid sequence—the “de 
novo” structure prediction problem—is a long- 
standing challenge in computational biology 
and chemistry. Although there are notable ex- 
ceptions, the majority of protein structures are 
likely to be at global free-energy minima for 
their amino acid sequences. The de novo pro- 
tein structure prediction problem hence is to 
find the lowest free-energy structure for a spec- 
ified amino acid sequence. The problem is chal- 
lenging because the size of the conformational 
space to be searched is vast (2) and because 
the accurate calculation of the free energies of 
protein conformations in solvent is difficult. 
Although there has been considerable progress
in low-resolution de novo protein structure
prediction (3), both the accuracy and the
reliability of the structural models produced by
these methods are fairly low: Cα-RMSDs (root
mean square deviation of alpha-carbon coordinates
after optimal superposition) of ~4 Å
with incorrect packing of the amino acid side 
chains. Achieving higher resolution requires 
both more physically realistic energy functions 
and better conformational searching; the prob- 
lem is difficult because the more realistic the 
energy function, the more rugged the land- 
scape, and thus the more difficult it is to 
search. Here, we show that high-resolution de 
novo structure prediction can be achieved by 
generating structurally diverse populations of 
low-resolution models and refining these 
structures in the context of a physically real- 
istic all-atom energy function. 
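
For reference, the Cα-RMSD used above as the measure of model accuracy can be written explicitly. The notation below is ours rather than the authors': x_i and y_i denote the alpha-carbon coordinates of residue i in the model and in the native structure, N is the number of residues compared, and the minimum is taken over rigid-body superpositions (a proper rotation R and a translation t).

```latex
\mathrm{RMSD}_{C_\alpha}
  = \min_{R \in SO(3),\; t \in \mathbb{R}^{3}}
    \sqrt{\frac{1}{N} \sum_{i=1}^{N}
      \left\lVert R\,x_i + t - y_i \right\rVert^{2}}
```

On this scale, the ~4 Å typical of low-resolution models and the <1.5 Å target quoted in the abstract differ mainly in whether side-chain packing in the core can be correct.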

Critical to high-resolution structure predic- 
tion is a force field for which native structures 
are low in free energy compared with non-native 
structures and a refinement protocol that can 
efficiently navigate the corresponding free- 
energy landscape. We have developed an all-atom
force field (4) that focuses on short-range
interactions (primarily van der Waals packing,
hydrogen bonding, and desolvation) while
neglecting long-range electrostatics. The high-resolution
refinement protocol (5, 6) is designed
to search in the local neighborhood of a starting 
model for low-energy structures. The protocol 
consists of multiple rounds of Metropolis Monte 
Carlo with minimization (7); each trial consists 
of a random perturbation of one or several 
backbone torsion angles, fast side-chain opti- 
mization using a rotamer representation (8, 9), 
and a gradient-based minimization of the ener- 
gy function with respect to backbone and side- 
chain torsion angles. In this way, the continuous 
space of backbone conformations and the dis- 
crete set of side-chain packing arrangements 
are searched simultaneously. Details on the energy
function and methods are provided in (10).
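
The trial structure described above (perturb, minimize, then apply the Metropolis criterion) can be illustrated with a deliberately small Python sketch. Everything in it is an assumption made for illustration: the energy is a toy periodic function of a few torsion angles rather than an all-atom force field, plain gradient descent stands in for the gradient-based minimizer, the side-chain rotamer optimization step is omitted, and all function names and parameter values are hypothetical.

```python
import math
import random


def toy_energy(angles):
    """Rugged toy landscape over torsion angles (radians); not a protein force field."""
    return sum(1.0 - math.cos(a) + 0.3 * (1.0 - math.cos(3.0 * a)) for a in angles)


def toy_gradient(angles):
    """Analytical gradient of toy_energy with respect to each angle."""
    return [math.sin(a) + 0.9 * math.sin(3.0 * a) for a in angles]


def minimize(angles, steps=50, rate=0.05):
    """Fixed-step gradient descent, standing in for the gradient-based minimizer."""
    x = list(angles)
    for _ in range(steps):
        x = [xi - rate * gi for xi, gi in zip(x, toy_gradient(x))]
    return x


def mc_with_minimization(start, n_trials=200, kT=0.5, max_moves=3, step=0.5, seed=0):
    """Metropolis Monte Carlo in which every trial move is minimized before scoring."""
    rng = random.Random(seed)
    current = minimize(start)
    e_current = toy_energy(current)
    best, e_best = current, e_current
    for _ in range(n_trials):
        trial = list(current)
        # Randomly perturb one or several torsion angles.
        n_moves = rng.randint(1, min(max_moves, len(trial)))
        for idx in rng.sample(range(len(trial)), n_moves):
            trial[idx] += rng.gauss(0.0, step)
        # A real protocol would re-optimize side-chain rotamers here (omitted).
        trial = minimize(trial)
        e_trial = toy_energy(trial)
        # Metropolis acceptance on the minimized energies.
        if e_trial <= e_current or rng.random() < math.exp(-(e_trial - e_current) / kT):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best


best_angles, best_energy = mc_with_minimization([2.0, -2.5, 1.0, 3.0])
print(f"lowest toy energy found: {best_energy:.4f}")
```

Scoring only minimized structures turns the rugged landscape into a set of local minima, so the Metropolis step compares basins rather than arbitrary points; this is the design choice the sketch is meant to show.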

Figure 1 and fig. S1 illustrate the challenge 
of high-resolution de novo structure prediction. 
All-atom refinement trajectories begun at the 
native state produce models (refined natives) 
that sample a deep near-native free-energy 
basin. Although these structures typically have 
lower all-atom energies than do non-native 
structures, Rosetta de novo models—built from 
an extended-chain starting conformation—do 
not sample close enough to the native structure 
to fall into this narrow energy well during all- 
atom refinement. The narrow widths of the 
native basins reflect the fact that nativelike 
side-chain packing can be disrupted by even 
relatively small backbone perturbations. Thus, 
the critical step in high-resolution structure 
prediction is generating low-resolution models 
that are within the “radius of convergence” of 
the native free-energy minimum using the all- 
atom refinement protocol. This is challenging, 
because the low-resolution search integrates out 
the side-chain degrees of freedom to smooth 
the energy landscape and hence lacks the detail 
necessary to reliably discriminate nativelike 
models, leading to false minima. We attempt 
to overcome this problem by generating low- 
resolution models for a large number of se- 